#Direct Preference Optimization 06/07/2025
Meta and NYU's Semi-Online Reinforcement Learning Enhances LLM Alignment Efficiency
Meta and NYU developed a semi-online reinforcement learning method that balances offline and online training to improve large language model alignment, boosting performance on both instruction-following and mathematical tasks.